Advanced Tab

To configure advanced options for the GCS file system adapter, set the following properties in the Advanced tab of the New Data Source connection window:


Field

Description

Concurrent Request Limit

This property accepts a value between 0 and 65536. It specifies the concurrency limit imposed on the underlying data source.

Default String Length

The default VARCHAR length.

Detect Partition During Introspection

Include this option to automatically detect partitions that the file might have.

Note that if partitions are not properly detected, both usability and performance will be adversely impacted.

CSV Options


Include CSV Files

Check this option to include the delimited files from the storage area.

Character Set

The character set used by the datasource.

Delimiter

Indicates the file delimiter character.

Text Qualifier

Indicates the type of qualifier that is used in the file to enclose a string field.

Has Header Row

Indicates whether or not the file has a header row.
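The adapter applies these options internally when it parses delimited files. As a rough illustration of what the delimiter, text qualifier, and header-row options control, here is a minimal sketch using Python's standard csv module (the file content and column names are invented):

```python
import csv
import io

# Invented sample content: comma delimiter, double-quote text qualifier,
# and a header row -- the three options described above.
raw = 'id,name\n1,"Smith, Jane"\n2,"Lee, Sam"\n'

# delimiter and quotechar correspond to the Delimiter and Text Qualifier
# fields; the qualifier lets a field contain the delimiter character.
reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"')
rows = list(reader)

# Has Header Row: treat the first row as column names, not data.
header, data = rows[0], rows[1:]

print(header)   # ['id', 'name']
print(data[0])  # ['1', 'Smith, Jane']
```

Note how "Smith, Jane" survives as a single field because the embedded comma is enclosed by the text qualifier.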

Infer Schema

Choosing this option enables the parser to infer the schema and datatypes of each column based on the data in the file.

Note: If this option is selected, it is recommended to provide a “sampling ratio” while introspecting the data source, so that only a sample of the data is read when inferring the schema. Providing the sampling ratio reduces overhead because not all rows have to be read during schema inference. Parquet files do not require schema inference, as their schema is encoded in their metadata.
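Sampling-based schema inference reads only a fraction of the rows and picks the narrowest type that fits every sampled value. The sketch below is a hypothetical illustration of the idea, not the adapter's actual implementation; the function names and the int/float/string type lattice are invented:

```python
import random

def infer_type(values):
    """Pick the narrowest type that fits every sampled value."""
    for cast, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"

def infer_schema(rows, header, sampling_ratio=0.1, seed=0):
    """Infer a column -> type map from a random sample of rows.

    Reading only a sample (rather than every row) is the overhead
    reduction that the recommended "sampling ratio" setting provides.
    """
    rng = random.Random(seed)
    sample = [r for r in rows if rng.random() < sampling_ratio] or rows[:1]
    return {col: infer_type([r[i] for r in sample])
            for i, col in enumerate(header)}

rows = [["1", "2.5", "x"]] * 100
schema = infer_schema(rows, ["a", "b", "c"], sampling_ratio=0.2)
print(schema)  # {'a': 'int', 'b': 'float', 'c': 'string'}
```

The trade-off is accuracy for speed: a low sampling ratio can miss a value that would have widened a column's type.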

CSV Escape Character

Indicates the escape character used in the file. The parser does not treat this character as data; it causes the character that follows it to be interpreted literally.

CSV Parser Lib

The library used to parse the delimited files. The libraries currently supported are commons (default) and uniVocity. For more information, refer to:

http://commons.apache.org/proper/commons-csv/
https://www.univocity.com/

CSV Parsing Mode

The parsing mode used by the data source. Allowed values are PERMISSIVE (include malformed rows), DROPMALFORMED (drop malformed rows), and FAILFAST (fail the introspection when a malformed row is encountered).
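The three modes differ only in how a malformed row is handled. A minimal sketch of that behavior, assuming for illustration that a "malformed" row is one whose field count differs from the header's (the function name and padding rule are invented, not the adapter's internals):

```python
def parse_rows(rows, n_cols, mode="PERMISSIVE"):
    """Apply one of the three parsing modes to already-split rows."""
    out = []
    for row in rows:
        if len(row) == n_cols:
            out.append(row)
        elif mode == "PERMISSIVE":
            # Keep the malformed row: pad or truncate to the column count.
            out.append((row + [None] * n_cols)[:n_cols])
        elif mode == "DROPMALFORMED":
            continue  # silently drop the bad row
        elif mode == "FAILFAST":
            raise ValueError(f"malformed row: {row!r}")
    return out

rows = [["a", "b"], ["only-one"], ["c", "d"]]
print(parse_rows(rows, 2, "PERMISSIVE"))     # bad row kept, padded
print(parse_rows(rows, 2, "DROPMALFORMED"))  # bad row silently removed
```

FAILFAST is the safest choice when you would rather investigate a bad file than introspect a partial or padded result.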

CSV Comment Character

Indicates the character that marks a line as a comment in the file.

CSV Null Value

Indicates what is considered a Null value in a row.

CSV File Name Filters

Indicates the file name extensions that are valid.
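A file name filter simply restricts which objects in the storage area are considered part of the data source. A small sketch of extension-based filtering with Python's standard fnmatch module (the patterns and file names are invented examples):

```python
import fnmatch

def filter_files(names, patterns=("*.csv", "*.tsv")):
    """Keep only file names matching one of the allowed patterns."""
    return [n for n in names
            if any(fnmatch.fnmatch(n, p) for p in patterns)]

listing = ["sales.csv", "sales.parquet", "notes.txt", "log.tsv"]
print(filter_files(listing))  # ['sales.csv', 'log.tsv']
```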

Parquet Options


Include Parquet Files

Check this option to include the Parquet files from the storage area.

Binary as String

Check this option to read binary values as strings.

INT96 as Timestamp

Check this option to read INT96 values as Timestamp values.

Compression Codec

Parquet files are typically compressed. This setting controls the compression codec used to process them. For more information about the different options, refer to:

https://spark.apache.org/docs/2.4.3/sql-data-sources-parquet.html

Filter Push-Down

Controls whether a predicate specified in a WHERE clause in a SQL query will be pushed down to the Cloud File System data source.

Convert Metastore

Controls whether to use the built-in Parquet reader and writer for Hive tables with the Parquet storage format. By default, this is set to True.

Merge Schema

For partitioned files, choosing this option merges the data and creates a single schema that includes the columns from all partitions.
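Schema merging amounts to taking the union of each partition's column set. A hypothetical sketch of that union, using a column-to-type dict per partition (the function name and conflict rule are invented for illustration):

```python
def merge_schemas(partition_schemas):
    """Union per-partition column -> type maps into a single schema.

    Columns present in only some partitions are kept; a type conflict
    on a shared column is reported rather than silently resolved.
    """
    merged = {}
    for schema in partition_schemas:
        for col, typ in schema.items():
            if col in merged and merged[col] != typ:
                raise ValueError(f"type conflict on column {col!r}")
            merged[col] = typ
    return merged

# One partition carries an extra 'name' column; the merged schema has both.
print(merge_schemas([{"id": "int"}, {"id": "int", "name": "string"}]))
```

Merging has a cost proportional to the number of partitions, which is why it is an opt-in setting rather than the default behavior.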

Parquet File Name Filters

Indicates the file name extensions that are valid.